SUBSTITUTE SEQUENCE LISTING 






(1) GENERAL INFORMATION : 

(i) APPLICANT: 

(A) NAME: GENSET SA 

(B) STREET: 24 RUE ROYALE 

(C) CITY: PARIS 

(E) COUNTRY: FRANCE 

(F) POSTAL CODE: 75008 



(ii) TITLE OF INVENTION: HUMAN DEFENSIN DEF-X GENE AND cDNA COMPOSITION 
CONTAINING SAME AND DIAGNOSTIC AND THERAPEUTIC APPLICATIONS 



(iii) NUMBER OF SEQUENCES: 6 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Saliwanchik, Lloyd & Saliwanchik 

(B) STREET: 2421 N.W. 41st Street, Suite A-1 

(C) CITY: Gainesville 
(D) STATE: Florida 

(E) COUNTRY: USA 

(F) ZIP: 32606 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 



(D) SOFTWARE: Release #1.0, Version #1.30 (EPO) 



(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: (unassigned) 

(B) FILING DATE: OCTOBER 18, 2001 

(vii) PRIORITY APPLICATION DATA: 

(A) APPLICATION NUMBER: 09/486,580 

(B) FILING DATE: FEBRUARY 25, 2000 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Frank C. Eisenschenk, Ph.D. 

(B) REGISTRATION NUMBER: 45,332 

(C) REFERENCE/DOCKET NUMBER: GEN-100D1 

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4415 BASE PAIRS 

(B) TYPE: NUCLEOTIDE 

(C) STRANDEDNESS: DOUBLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: Exon 1 

(B) LOCATION: 1836..1874 

(ix) FEATURE: 

(A) NAME/KEY: Exon 2 

(B) LOCATION: 3394..3577 

(ix) FEATURE: 

(A) NAME/KEY: Exon 3 

(B) LOCATION: 4161..4380 

(ix) FEATURE: 

(A) NAME/KEY: Start CDS 

(B) LOCATION: 3406..3408 

(ix) FEATURE: 

(A) NAME/KEY: Stop CDS 

(B) LOCATION: 4276..4278 

(ix) FEATURE: 

(A) NAME/KEY: polyadenylation site 

(B) LOCATION: 4374..4379 
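
As an illustrative aside, the FEATURE locations for SEQ ID NO: 1 can be read as `start..end` ranges. The sketch below assumes they are 1-based and inclusive, so a feature spans `end - start + 1` bases. The helper name `span_length` is hypothetical and is not part of the listing.

```python
def span_length(location: str) -> int:
    """Length in bases of a 'start..end' location (assumed 1-based, inclusive)."""
    start, end = (int(part) for part in location.split(".."))
    return end - start + 1

# Location strings copied from the FEATURE entries of SEQ ID NO: 1.
locations = ["1836..1874", "3394..3577", "4161..4380", "4276..4278", "4374..4379"]

# Map each location string to its length in bases.
lengths = {loc: span_length(loc) for loc in locations}

for loc, n in lengths.items():
    print(f"{loc}: {n} bases")
```

Under that assumption the three-base locations are consistent with single codons, and the six-base location is consistent with a hexanucleotide polyadenylation signal.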

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

ACACCATTTG TCTTCATGTA ACCCCATTAG CTATACCCTC TAGTGCAAGG AAACCATAGG 60
GCCTAGGTCA CACCATGAGG CTGCNCTTAC AAGTTATGCA AAAACTATGG ACTTGGGAGA 120
CCTGTGCGTA ACAACATCAC ACNCCAAATT TAACCAGCTC TCCCCATAAC AGCACGCTCA 180
TGTGTTACTG AGGAAATGCC TGTGGATTGG AGTGTGTTCT GTGTGCAGGA GGCTGGTCCA 240
GGTTTCACTT CTGCAGGACA CTGGACGTTT CCCAAAACCA GCAGACTTTC CCCACGTGCA 300
CACACACCCC TTCTCATTTT GCCTCTACAT CCATATCCAC TGGGCCCTTC AGGCACCTAC 360
TAATGCCCTA GAACCTAAAA CCATCATCTG GGGCCCAGTT CCCTGAATGG CCCTAATCTC 420
TTCCTCTGCT GGAATGAGTC CAGTGCCCAC TTCCTCCAAC GGTGAAATTG CTGGGCTGCT 480
ACAGATCAGG AACTCACTGC TTCCTCATAG GGGCAGCCGA CTTCACTGCT CTGCAACAGC 540
GACCACCCCT AGCGAGGCTT GAGATGCCTC TTGCCTCCTT AAGACTGAGG GAGACGCTTC 600
AGCTCTCACT CCACTGCCCC AAGTCCTCCA CAGCGCGGTG CCTGCTGCCT TCACACAGAG 660
CTGCAGGGGN AGGTCCTGTG TATCCGGCCT GCTGGACCAG CGCTGTGCAC AACCCTCCCA 720
TGGCAACAGT GGCTGCCCGG CCTGCACACT GGGCTTGGCA ACCTCGCTGT AGGTATTTAT 780
TCCCTCAGGA GTGACTGCAT TCTTTTCCCA TTTCCAGAAA ACTGATGCCA TTTACCTCAC 840
TATGAGGAGG AGGAGGAGGA GGAGGGTGGA GAGTGGTACA TTTTAAAATG TGCACTATTC 900
TCCCTAGGAC TCCCCCTCAA ATAACCCAGG AGGGACCATA CCAGCTCATT CCTGTGTATC 960
CCAAGCATAN GAGTAATCAT CCCACTCATG CTGAGTGTAT GGTGGCCATT AAGCCTGCCC 1020
TGAACTGGCT TTAGAACAAG GTGTTTGAGC ACACAGCACC GTCTTGCTGC CACCTTGGCC 1080
CCCTCCCTTG TGAGACCTCT GAGACACATT NAGGTCTCAC CTAAAAATCT CAGGATTTCT 1140
AGGCCCAAAN CGGTCCTAAA AAATTGTTCA GTCTGAACTC TCTAAGGTCA AGAGAAGAGG 1200
TGGTTGCTCC CTCTAAGAAA CCACATGTTG CATGTACATC CTTAATTCCG GAAAGTCCAA 1260
CAAACCTGCC CTGCTTAGCA ACACAAGCCG AGGTGGTACT CCTCTCACCC GGGCATTCTC 1320
CAACACACCT GTTTGTCCAA ACAGCTTTGA TTTGTTTTTA TAGTTGGACC CCAGGTTCCC 1380
AGGAGGCTGG TTCAGGCCAT ATTCCAAATC CTCATCTGTG TGTGAGTGGC ATTCTTAGCC 1440
TAGCCTCCTT ACAGGGTGGA TACTATGATA CACAGCCAGG CTGTCCCAGT GGCTTTCAAT 1500
ATTCTTTTGG TCCAGATAGT TCAGCCTCAG CACCAGTGTA GGCATCACAG GGTCAATTGT 1560
CTTAGGAGTC ATGGAGAATT CATAGTTGGT AGCTACCTGG GCCTGGCCAG GGCTGACCAT 1620
AGACAAGGCA TCCCTCTGTG AACTCCTATT TTAATGCCAG CTTCCCAACA AATTTCTCAA 1680
CTGCTCTTAC CAGCAGGTAT TTAAACTACT CAATAGAAAG TAACCCTGAA AATTAGGACA 1740
CCTGTTCCCA AAAGACCCTT AAATAGGGGA AGTCCTTTCN CTGCTTGTGC ACAGCTGCTG 1800
ATGTGGCAAC ATGAGGCCTG GGACAGGGGA CTGTCCTCTG CCCACTCTGG TAGCCTCACG 1860
TAGCTTAACA ATCTGTCAGT AATACAATAC AAAACTTAAA CTTTCATACT GCGGTTCCAC 1920
CCAGGAAGCT GTGTTCCCAA TCTGACCCGT GATTATGGGG CCACCTCAGA GGGNACCCAG 1980
TGAGGGAATA TTTTGCCATC TGGGACTGTT GGTTGCTGGG GGCAGTGGCT ATGAGCTCAG 2040
TTAATAAACT CAAGCAGTTT CCTTCCAAAC ACACATGTCC TACTTAACGT GTCCAACAGA 2100
GATGATCATA CTCATANGCT GCTAAAACAT TANTTTTATT TTGAGAAAAG TCTATTCATG 2160
TTCTTGGCCC ATGGAGTTTT CATTTNATTA NTTTATTTAT TTTGCAGAGA TGGAGTCTCA 2220
CTATGTTGCT CAAGCTGGTC TCCAACTCCT GGGCTCAAGC GATCTTCCTA CTTTGGCCTT 2280
TGAAAGCGCT GAGATTGCCT GTGTGAGCCA TCATGGGGGC TCACTGGCCC ACTGATTAAT 2340
CAGATTAATT GTTTTTTGCT ATTGAANTTG TTTGACTTCC TTGTATATTC GGATATTTAC 2400
CCATTCTAAC ACGTAGGGTT TGCAAATATT TTCTCTCATG TTCTGTGTTG CCTTTTCACT 2460
CAGTTGATGG TTTCCTTTGC TGTGCAGGTG CTTTAGTGTT CAACGCAGCC CCGCTTGTCT 2520
ATTTTCCATT TTATTGCCTG TCCCTTTGAT GTCATAGCCA AGAAATAATT GCCCAGATTA 2580
ATGTCAAAAA GCTTTATCCC TATATATTCT TCTAGTAGTT TATGGTTTCA GATCTTATGT 2640
TTAGGTCTTC AATCCATTGA GTTGATTTTT GTATGTGGTA TAAGAAAAAA GACCACATGT 2700
ATACATATCT CAAATTCTAA GGTAGTATAT ATTAGACACA TACAATGTGT CTATTTACAC 2760
ACATTGAGCT GAAAATAATA AACATATTTT TATCTTTCAA TCAACTCTAT CTCTATCTCA 2820
CTGAACTTGT TTCACCTATA GCCTGATGAG GTTGCTGTCC TCTCTACCCC AGCTCCTATA 2880
GGAGACTGCT CATCCCCTAA CCTCAAAAAC CCCTTCATGA GGGTGATAAT GCCCTTGAAT 2940
CCTGCAATGA ATTAGTTCTC TACTACAGTG GAATTCAGGT CTGTTATGAG GGTCTGGATC 3000



TCTGAAGAGA AGAGCTCTCA TTTTCAGAAA ATAAGCAGGA TTTATTCCCT GAAATTACTG 3060
AATTAAATCA CTGTTTCGAT TACTTTTTGC AATATTAAAA GTAAATATTT AAACAGGTAA 3120
AAACAGAAAT AATGGTAGGG TCCTTATCAT CACCGTGAAT TCCAAGCTAG CATAGACACT 3180
AAACCTAGAG ATTCACACTA GAATGAAAGC TGGGAGAGCA GAGGAGTCTC AGAAGGATGT 3240
GGAGGCCAAT GGACACCTGC AACCTCTCCA ACGAAATGCC TACCTCCTCT CACTGCAGCA 3300
TCCATCTCTG AGCCTTCTCG CAGCAGAGCT ATAAATTCAG CCTGGCTCCT CCGTTCCCAC 3360
ACATCCACTC CTGCTCTCCC TCCTCTCCTC CAGGTGACTA CAGTTATGAG GACCCTCACC 3420
CTCCTCTCTG CCTTTCTCCT GGTGGCCCTT CAGGCCTGGG CAGAGCCGCT CCAGGCAAGA 3480
GCTCATGAGA TGCCAGCCCA GAAGCAGCCT CCAGCAGATG ACCAGGATGT GGTCATTTAC 3540
TTTTCAGGAG ATGACAGCTG CTCTCTTCAG GTTCCAGGTG AGAGATGCCA GCATGCAGAG 3600
CTACAGACTA GACAGAAGGA CAGGAGACAG GCTCTGGAAT TGGATCTCAG TGGCAGATGT 3660
CACTTAGGTG GCTATACTTA ACATCTCTGG TCCTGGATTT TCTCATATCT AAATGGAATA 3720
GAGAACCAAA GAAATCTAAG AGATTTTTCT TTCTCCAAAA ACTTGATTCC AAGATATGAC 3780
TGTGAAATTC ACTAGATTTA AGATATAAGG AGATGCTACC TAGTTCCTTC TGGAGCCAGA 3840
CAAACAAGCT TAAGTATATA GGAAAATATT TCACCCTGTC TATATAGGAG GTTTTAGAAC 3900
CTGGAGAGGA GCCTAAGAAT GTGTTCAGGT GTGTGTGTGA TGGGCAGGAA TGCAGAAAAG 3960
TGAAGCAAAG GAGAATGAGT CTCGAATCCT GTGTGACCAG CACTGCTCTG TGTATTTATT 4020
CCTATTGACT GAGATTGTTT GTGCTACCGG CTGTAATACA GCCAACATCA CTCATCAGCC 4080
AACATGTGAC TTCTCCAAGA TTCCCTTTAC CACCCACTGC TGNACCCCGT ACTCAGTTTC 4140
TGATGCTCTC TCTGGGTCCC CAGGCTCAAC AAAGGGCTTG ATCTGCCATT GCAGAGTACT 4200
ATACTGCATT TTTGGAGAAC ATCTTGGTGG GACCTGCTTC ATCCTTGGTG AACGCTACCC 4260
AATCTGCTGC TACTAAGCTT GCAGACTAGA GAAAAAGAGT TCATAATTTT CTTTGAGCAT 4320
TAAAGGGAAT TGTTATTCTT ATACCTTGTC CTCGATTTCC TGTCCTCATC CCAAATAAAT 4380
ACTTGGTAAC ATGATTTCCG GGTTTTTTTT TTTTT 4415

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 453 BASE PAIRS 

(B) TYPE: NUCLEOTIDE 

(C) STRANDEDNESS: DOUBLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

CTCTGCCCAC TCTGGTAGCC TCACGTAGCT TAACAATCTG TGACTACAGT T ATG AGG
                                                         Met Arg
                                                         1

ACC CTC ACC CTC CTC TCT GCC TTT CTC CTG GTG GCC CTT CAG GCC TGG
Thr Leu Thr Leu Leu Ser Ala Phe Leu Leu Val Ala Leu Gln Ala Trp
        5                   10                  15

GCA GAG CCG CTC CAG GCA AGA GCT CAT GAG ATG CCA GCC CAG AAG CAG
Ala Glu Pro Leu Gln Ala Arg Ala His Glu Met Pro Ala Gln Lys Gln
    20                  25                  30

CCT CCA GCA GAT GAC CAG GAT GTG GTC ATT TAC TTT TCA GGA GAT GAC
Pro Pro Ala Asp Asp Gln Asp Val Val Ile Tyr Phe Ser Gly Asp Asp
35                  40                  45                  50

AGC TGC TCT CTT CAG GTT CCA GGC TCA ACA AAG GGC TTG ATC TGC CAT
Ser Cys Ser Leu Gln Val Pro Gly Ser Thr Lys Gly Leu Ile Cys His
                55                  60                  65

TGC AGA GTA CTA TAC TGC ATT TTT GGA GAA CAT CTT GGT GGG ACC TGC
Cys Arg Val Leu Tyr Cys Ile Phe Gly Glu His Leu Gly Gly Thr Cys
            70                  75                  80

TTC ATC CTT GGT GAA CGC TAC CCA ATC TGC TGC TAC TAA GCTTGCAGAC
Phe Ile Leu Gly Glu Arg Tyr Pro Ile Cys Cys Tyr *
        85                  90                  95

TAGAGAAAAA GAGTTCATAA TTTTCTTTGA GCATTAAAGG GAATTGTTAT TCTTATACCT

TGTCCTCGAT TTCCTGTCCT CATCCCAAAT AAATACTTGG TAACATG
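
As a hedged cross-check of the codon and amino acid pairing in SEQ ID NO: 2, the sketch below translates the coding region with the standard genetic code. The codons are transcribed by hand from the listing, and the names `CODON_TABLE`, `CDS_CODONS` and `protein` are illustrative rather than part of the filing.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Coding region of SEQ ID NO: 2, transcribed codon by codon from the listing.
CDS_CODONS = (
    "ATG AGG "
    "ACC CTC ACC CTC CTC TCT GCC TTT CTC CTG GTG GCC CTT CAG GCC TGG "
    "GCA GAG CCG CTC CAG GCA AGA GCT CAT GAG ATG CCA GCC CAG AAG CAG "
    "CCT CCA GCA GAT GAC CAG GAT GTG GTC ATT TAC TTT TCA GGA GAT GAC "
    "AGC TGC TCT CTT CAG GTT CCA GGC TCA ACA AAG GGC TTG ATC TGC CAT "
    "TGC AGA GTA CTA TAC TGC ATT TTT GGA GAA CAT CTT GGT GGG ACC TGC "
    "TTC ATC CTT GGT GAA CGC TAC CCA ATC TGC TGC TAC TAA"
).split()

# Translate each codon; '*' marks the stop codon.
protein = "".join(CODON_TABLE[codon] for codon in CDS_CODONS)
print(protein)
print(len(protein) - 1, "residues before the stop codon")
```

The residue count before the stop codon can be compared with the 94 amino acids stated for SEQ ID NO: 3.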




(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 94 AMINO ACIDS 

(B) TYPE: AMINO ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: PROTEIN 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: SIGNAL PEPTIDE 

(B) LOCATION: 1..19 

(ix) FEATURE: 

(A) NAME/KEY: PRO REGION 

(B) LOCATION: 20..63 

(ix) FEATURE: 

(A) NAME/KEY: MATURE PEPTIDE 

(B) LOCATION: 64..94 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

Met Arg Thr Leu Thr Leu Leu Ser Ala Phe Leu Leu Val Ala Leu Gln
1               5                   10                  15

Ala Trp Ala Glu Pro Leu Gln Ala Arg Ala His Glu Met Pro Ala Gln
            20                  25                  30

Lys Gln Pro Pro Ala Asp Asp Gln Asp Val Val Ile Tyr Phe Ser Gly
        35                  40                  45

Asp Asp Ser Cys Ser Leu Gln Val Pro Gly Ser Thr Lys Gly Leu Ile
    50                  55                  60

Cys His Cys Arg Val Leu Tyr Cys Ile Phe Gly Glu His Leu Gly Gly
65                  70                  75                  80

Thr Cys Phe Ile Leu Gly Glu Arg Tyr Pro Ile Cys Cys Tyr
                85                  90
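
To illustrate how the locations 1..19, 20..63 and 64..94 partition SEQ ID NO: 3, the sketch below slices a one-letter transcription of the sequence. The one-letter string is a hand conversion of the three-letter listing, the helper `region` is hypothetical, and locations are assumed to be 1-based and inclusive.

```python
# One-letter transcription of SEQ ID NO: 3 (hand-converted from the listing).
PROTEIN = (
    "MRTLTLLSAFLLVALQ"
    "AWAEPLQARAHEMPAQ"
    "KQPPADDQDVVIYFSG"
    "DDSCSLQVPGSTKGLI"
    "CHCRVLYCIFGEHLGG"
    "TCFILGERYPICCY"
)

def region(seq: str, start: int, end: int) -> str:
    """Return residues start..end, treating both bounds as 1-based and inclusive."""
    return seq[start - 1:end]

signal = region(PROTEIN, 1, 19)    # location 1..19
pro = region(PROTEIN, 20, 63)      # location 20..63
mature = region(PROTEIN, 64, 94)   # location 64..94

for name, part in [("signal", signal), ("pro", pro), ("mature", mature)]:
    print(name, len(part), part)
```

The three slice lengths can be compared with the lengths stated for SEQ ID NOs: 4, 5 and 6 (19, 44 and 31 amino acids).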

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 AMINO ACIDS 

(B) TYPE: AMINO ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: SIGNAL PEPTIDE 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Arg Thr Leu Thr Leu Leu Ser Ala Phe Leu Leu Val Ala Leu Gln
1               5                   10                  15

Ala Trp Ala



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 AMINO ACIDS 

(B) TYPE: AMINO ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: PRO REGION 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Glu Pro Leu Gln Ala Arg Ala His Glu Met Pro Ala Gln Lys Gln Pro
1               5                   10                  15

Pro Ala Asp Asp Gln Asp Val Val Ile Tyr Phe Ser Gly Asp Asp Ser
            20                  25                  30

Cys Ser Leu Gln Val Pro Gly Ser Thr Lys Gly Leu
        35                  40



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 AMINO ACIDS 

(B) TYPE: AMINO ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: MATURE PEPTIDE 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Ile Cys His Cys Arg Val Leu Tyr Cys Ile Phe Gly Glu His Leu Gly
1               5                   10                  15

Gly Thr Cys Phe Ile Leu Gly Glu Arg Tyr Pro Ile Cys Cys Tyr
            20                  25                  30